Sanity check for trained model¶

  • load a saved model
  • make a tiny test dataset with 10 examples from each class
  • make predictions on these test examples
  • display the spectrograms, audio and model prediction for each of the examples.

This report shows the model's predictions vs. ground truth for a few examples. In each example, the Olive-sided Flycatcher (OSFL) is either present or absent:

  • Present - the 3 s audio clip contains a complete human-labelled tag of an OSFL vocalization.
  • Absent - the clip is audio taken from before the start of a human-labelled OSFL tag.

The scores are computed on a validation set, defined as:

  • The unseen audio from 20% of the ARU (autonomous recording unit) locations in the training set. The model has not been trained on audio from these locations.

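A location-wise split like this can be sketched in plain Python. This is an illustration only, not the project's actual splitting code; the ARU IDs and the `split_by_location` helper are made up.

```python
import random

def split_by_location(clips, valid_fraction=0.2, seed=0):
    """Hold out every clip from a random subset of ARU locations.

    clips: list of (location_id, clip_id) pairs. All clips from a held-out
    location go to validation, so the model is never trained on audio from
    the locations it is scored on.
    """
    locations = sorted({loc for loc, _ in clips})
    rng = random.Random(seed)
    n_valid = max(1, round(len(locations) * valid_fraction))
    valid_locations = set(rng.sample(locations, n_valid))
    train = [c for c in clips if c[0] not in valid_locations]
    valid = [c for c in clips if c[0] in valid_locations]
    return train, valid

# 10 hypothetical ARU locations with 5 clips each
clips = [(f"ARU-{i}", j) for i in range(10) for j in range(5)]
train, valid = split_by_location(clips)
```

Splitting by location rather than by clip avoids leakage: clips from the same site share background acoustics, so a random clip-level split would overstate validation scores.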
There is some further processing that needs to be applied to this dataset:¶

  • Mix in audio from other times of day and other habitats, to ensure that the training data contains as much variety as possible.
  • Replace the human-labelled audio samples with high-scoring ones picked out by the HawkEars model - these should end up all being focal recordings. This is assumed to produce a relationship between sound power and recognizer score, which will enable density estimation and other downstream statistical applications.
  • Bandpass the input signal to remove sounds at frequencies outside the OSFL vocalization range.
  • Go through the audio samples in the validation set, remove any that are obviously labelled incorrectly, and flag those that are borderline.
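The bandpass step can be sketched as a crude FFT brick-wall filter. The band edges below are illustrative, not the OSFL's actual range, and in practice a proper filter design (e.g. a Butterworth filter) would be the usual choice.

```python
import numpy as np

def bandpass_fft(signal, sr, low_hz, high_hz):
    """Zero all FFT bins outside [low_hz, high_hz] (a brick-wall bandpass)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0
    return np.fft.irfft(spectrum, n=len(signal))

# one second of a 100 Hz hum (out of band) plus a 2 kHz tone (in band)
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 2000 * t)
y = bandpass_fft(x, sr, low_hz=1000, high_hz=4000)  # only the 2 kHz tone survives
```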
In [ ]:
# imports
import sys
from pathlib import Path

import pandas as pd

# make the project root importable before importing project modules
BASE_PATH = Path.cwd().parent.parent
sys.path.append(str(BASE_PATH))

import opensoundscape as opso
from opensoundscape import Audio, Spectrogram
/Users/mikeg/miniforge3/envs/osfl2/lib/python3.10/site-packages/opensoundscape/ml/cnn.py:18: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
  from tqdm.autonotebook import tqdm
In [ ]:
# Load the validation set
train_valid_set_path = BASE_PATH / 'data' / 'interim' / 'train_and_valid_set' 
valid_set = pd.read_pickle(train_valid_set_path / 'valid_ds_sample_size_0.1.pkl')
valid_set.sample(5)
Out[ ]:
                                                                          target_presence  target_absence
file                                                 start_time end_time
../../data/raw/recordings/OSFL/recording-816784.flac 12.0       15.0                  0.0             1.0
../../data/raw/recordings/OSFL/recording-291508.mp3  96.0       99.0                  0.0             1.0
../../data/raw/recordings/OSFL/recording-815818.flac 15.0       18.0                  0.0             1.0
../../data/raw/recordings/OSFL/recording-481585.flac 169.5      172.5                 0.0             1.0
../../data/raw/recordings/OSFL/recording-291730.mp3  220.5      223.5                 0.0             1.0
In [ ]:
# Take a sample of 10 from each class
present_samples = valid_set.loc[valid_set['target_presence']==1].sample(10)
absent_samples = valid_set.loc[valid_set['target_absence']==1].sample(10)
present_samples.index[0]
Out[ ]:
(PosixPath('../../data/raw/recordings/OSFL/recording-291576.mp3'), 0.0, 3.0)
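As the tuple above shows, each row is keyed by a (file, start_time, end_time) MultiIndex. A toy frame with the same shape can be built and indexed the same way (the paths here are made up):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a.mp3", 0.0, 3.0), ("a.mp3", 3.0, 6.0), ("b.flac", 0.0, 3.0)],
    names=["file", "start_time", "end_time"],
)
df = pd.DataFrame(
    {"target_presence": [1.0, 0.0, 1.0], "target_absence": [0.0, 1.0, 0.0]},
    index=idx,
)
row = df.loc[("a.mp3", 0.0, 3.0)]  # look up one clip by its full index tuple
```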
In [ ]:
# Load a trained model. 
model = opso.cnn.load_model(BASE_PATH / 'models' / 'best.model')
model.valid_metrics
Out[ ]:
{0: {'confusion_matrix': array([[ 914,   69],
         [8978, 1541]]),
  'precision': 0.9571428571428572,
  'recall': 0.1464968152866242,
  'f1': 0.2541017396322862,
  'jaccard': 0.11864998939763832,
  'hamming_loss': 0.7865588593288124},
 1: {'confusion_matrix': array([[ 948,   35],
         [9303, 1216]]),
  'precision': 0.9720223820943246,
  'recall': 0.11560034223785531,
  'f1': 0.20662701784197113,
  'jaccard': 0.10369054294846008,
  'hamming_loss': 0.8118588071639715},
 2: {'confusion_matrix': array([[ 914,   69],
         [6383, 4136]]),
  'precision': 0.9835909631391201,
  'recall': 0.3931932693221789,
  'f1': 0.5618038576473784,
  'jaccard': 0.2573572651932767,
  'hamming_loss': 0.5609459224482699},
 3: {'confusion_matrix': array([[ 890,   93],
         [4288, 6231]]),
  'precision': 0.9852941176470589,
  'recall': 0.5923566878980892,
  'f1': 0.7398919432405153,
  'jaccard': 0.37800694445487304,
  'hamming_loss': 0.3808902799513128},
 4: {'confusion_matrix': array([[ 813,  170],
         [1501, 9018]]),
  'precision': 0.9814976055724859,
  'recall': 0.8573058275501474,
  'f1': 0.9152077941848075,
  'jaccard': 0.5854828748503473,
  'hamming_loss': 0.1452790818988002}}
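The per-epoch metrics above can be cross-checked from the confusion matrices alone. Assuming the matrices follow the sklearn `[[TN, FP], [FN, TP]]` layout (the reported precision and recall values are consistent with that reading), a minimal sketch:

```python
def metrics_from_confusion(cm):
    """Precision/recall/F1 for the positive class, assuming the
    sklearn [[TN, FP], [FN, TP]] confusion-matrix layout."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# epoch 4's matrix from the output above
precision, recall, f1 = metrics_from_confusion([[813, 170], [1501, 9018]])
```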
In [ ]:
model.predict(valid_set, batch_size=32)
Out[ ]:
                                                                          target_presence  target_absence
file                                                 start_time end_time
../../data/raw/recordings/OSFL/recording-4819.mp3    0.0        3.0             -1.017535        1.184056
                                                     1.5        4.5             -2.782332        2.765101
                                                     3.0        6.0             -1.430539        1.502954
                                                     4.5        7.5             -1.984744        1.893654
                                                     6.0        9.0             -2.831836        2.863616
...                                                  ...        ...                   ...             ...
../../data/raw/recordings/OSFL/recording-826279.flac 4.5        7.5             -3.197888        3.422475
                                                     7.5        10.5            -0.686746        1.046495
../../data/raw/recordings/OSFL/recording-826374.flac 0.0        3.0             -3.158400        3.226395
                                                     3.0        6.0             -2.232522        2.313747
                                                     15.0       18.0             6.799380       -7.086478

1617 rows × 2 columns

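The raw `predict()` scores above are unnormalized logits, which is why they range well outside [0, 1]. Assuming `activation_layer='sigmoid'` in the following cells applies the standard logistic function, the mapping looks like this:

```python
import math

def sigmoid(x):
    """Map a raw logit to a probability-like score in (0, 1)."""
    return 1 / (1 + math.exp(-x))

# the strongly positive clip at 15.0-18.0 in the output above
score = sigmoid(6.799380)
```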
In [ ]:
present_preds = model.predict(present_samples, activation_layer='sigmoid')
In [ ]:
absent_preds = model.predict(absent_samples, activation_layer='sigmoid')
In [ ]:
# rename columns for better clarity after they're combined
present_samples.rename(columns = {'target_presence':'present_label', 'target_absence':'absent_label'}, inplace = True) 
present_preds.rename(columns = {'target_presence':'present_pred', 'target_absence':'absent_pred'}, inplace = True)
absent_samples.rename(columns = {'target_presence':'present_label', 'target_absence':'absent_label'}, inplace = True)
absent_preds.rename(columns = {'target_presence':'present_pred', 'target_absence':'absent_pred'}, inplace = True)

# combine labels and predictions for samples of present and absent classes
present_labels_and_preds = pd.concat([present_samples, present_preds], axis=1)
absent_labels_and_preds = pd.concat([absent_samples, absent_preds], axis=1)
combined = pd.concat([present_labels_and_preds, absent_labels_and_preds], axis=0)
combined
Out[ ]:
                                                                          present_label  absent_label  present_pred  absent_pred
file                                                 start_time end_time
../../data/raw/recordings/OSFL/recording-291576.mp3  0.0        3.0                 1.0           0.0      0.995771     0.002882
../../data/raw/recordings/OSFL/recording-291508.mp3  295.5      298.5               1.0           0.0      0.990465     0.007913
../../data/raw/recordings/OSFL/recording-552659.flac 13.5       16.5                1.0           0.0      0.882287     0.120558
../../data/raw/recordings/OSFL/recording-292300.mp3  3.0        6.0                 1.0           0.0      0.741774     0.274831
../../data/raw/recordings/OSFL/recording-553501.flac 177.0      180.0               1.0           0.0      0.526428     0.483489
../../data/raw/recordings/OSFL/recording-292035.mp3  28.5       31.5                1.0           0.0      0.999022     0.000715
../../data/raw/recordings/OSFL/recording-294423.mp3  3.0        6.0                 1.0           0.0      0.999152     0.000562
../../data/raw/recordings/OSFL/recording-295299.mp3  18.0       21.0                1.0           0.0      0.997536     0.001722
../../data/raw/recordings/OSFL/recording-292249.mp3  6.0        9.0                 1.0           0.0      0.995068     0.002990
../../data/raw/recordings/OSFL/recording-104311.mp3  130.5      133.5               1.0           0.0      0.998575     0.001363
../../data/raw/recordings/OSFL/recording-554028.flac 10.5       13.5                0.0           1.0      0.015250     0.984797
../../data/raw/recordings/OSFL/recording-291508.mp3  217.5      220.5               0.0           1.0      0.169071     0.816377
../../data/raw/recordings/OSFL/recording-481585.flac 97.5       100.5               0.0           1.0      0.562846     0.408219
../../data/raw/recordings/OSFL/recording-296785.mp3  22.5       25.5                0.0           1.0      0.049864     0.950873
../../data/raw/recordings/OSFL/recording-553491.flac 99.0       102.0               0.0           1.0      0.074578     0.927464
../../data/raw/recordings/OSFL/recording-291730.mp3  69.0       72.0                0.0           1.0      0.015597     0.985294
../../data/raw/recordings/OSFL/recording-292071.mp3  43.5       46.5                0.0           1.0      0.057902     0.951844
../../data/raw/recordings/OSFL/recording-554028.flac 19.5       22.5                0.0           1.0      0.667904     0.336344
../../data/raw/recordings/OSFL/recording-553501.flac 34.5       37.5                0.0           1.0      0.074124     0.937287
../../data/raw/recordings/OSFL/recording-815882.flac 36.0       39.0                0.0           1.0      0.047265     0.958468
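At a 0.5 threshold on present_pred, this 20-example sample contains no missed present clips and two false alarms on absent clips (scores 0.562846 and 0.667904). That tally can be checked directly, with the scores copied from the table:

```python
present_scores = [0.995771, 0.990465, 0.882287, 0.741774, 0.526428,
                  0.999022, 0.999152, 0.997536, 0.995068, 0.998575]
absent_scores = [0.015250, 0.169071, 0.562846, 0.049864, 0.074578,
                 0.015597, 0.057902, 0.667904, 0.074124, 0.047265]

threshold = 0.5
false_negatives = sum(s < threshold for s in present_scores)   # missed present clips
false_positives = sum(s >= threshold for s in absent_scores)   # false alarms
n_total = len(present_scores) + len(absent_scores)
accuracy = 1 - (false_negatives + false_positives) / n_total
```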
In [ ]:
def show_example(counter):
    """Display the audio, spectrogram, label and prediction for one example,
    then return the index of the next example."""
    path, offset, end_time = combined.index[counter]
    duration = end_time - offset
    audio = Audio.from_file(path, offset=offset, duration=duration)
    spectrogram = Spectrogram.from_audio(audio)
    print(path, offset, end_time)
    print(audio.metadata)
    print(f"Present Prediction = {combined.iloc[counter].present_pred} \nActual = {combined.iloc[counter].present_label}")
    print("Check below")
    audio.show_widget()
    spectrogram.plot()
    return counter + 1

Show Predictions¶

See each example's labels, predictions, audio and spectrogram below.

In [ ]:
next_example_idx = 0
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-291576.mp3 0.0 3.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9957714676856995 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-291508.mp3 295.5 298.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9904650449752808 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-552659.flac 13.5 16.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 13229824, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.8822866082191467 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-292300.mp3 3.0 6.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 7957844, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.7417735457420349 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-553501.flac 177.0 180.0
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.5264281034469604 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-292035.mp3 28.5 31.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9990221261978149 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-294423.mp3 3.0 6.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 2653384, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9991520643234253 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-295299.mp3 18.0 21.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 2653384, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9975355863571167 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-292249.mp3 6.0 9.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 7957844, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.995067834854126 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-104311.mp3 130.5 133.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 7937792, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9985754489898682 
Actual = 1.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-554028.flac 10.5 13.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.015250089578330517 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-291508.mp3 217.5 220.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.16907060146331787 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-481585.flac 97.5 100.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 13230000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.5628458857536316 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-296785.mp3 22.5 25.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 2653384, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.049863532185554504 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-553491.flac 99.0 102.0
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.07457751035690308 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-291730.mp3 69.0 72.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.015596892684698105 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-292071.mp3 43.5 46.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.05790164694190025 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-554028.flac 19.5 22.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.6679044961929321 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-553501.flac 34.5 37.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.07412396371364594 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-815882.flac 36.0 39.0
{'comment': 'Processed by SoX', 'samplerate': 32000, 'format': 'FLAC', 'frames': 19200000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.04726511985063553 
Actual = 0.0
Check below
[audio widget and spectrogram image omitted in export]

Further work for this notebook:

  1. Show the confusion matrix.
  2. Plot the examples with the highest losses.
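For item 1, a binary confusion matrix can be tallied without any extra dependencies. A sketch; the labels and scores below are illustrative stand-ins for the model outputs above:

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for binary labels and sigmoid scores."""
    cm = [[0, 0], [0, 0]]
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        cm[y][pred] += 1
    return cm

labels = [1, 1, 1, 0, 0, 0]
scores = [0.99, 0.74, 0.41, 0.05, 0.66, 0.02]
cm = confusion_matrix(labels, scores)
```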